
feat(scanner): Issue 3 - Implement false-positive noise filter #14

Merged
pureliture merged 2 commits into main from antigravity/issue-3-noise-filter
Jun 12, 2026

@pureliture

Purpose & Motivation

Resolves #3.

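Among the findings Gitleaks reports, this PR filters out obvious false-positive noise (template placeholders, dummy values, repeated characters, low-entropy strings) before they reach the LLM verifier. The goal is to reduce storage/write volume and verifier cost, and to improve signal quality in the report/evaluate stages.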

Context

  • Documented the parser-stage noise filter decision in docs/views/research-and-technical-decisions.md.
  • Added a noise classifier in src/security_scanner/scanners/gitleaks/filter.py.
  • parse_gitleaks_report() now filters raw Gitleaks items before calling map_gitleaks_item().
  • Added ScanOptions.enable_noise_filter, which can be controlled through scan.enable_noise_filter in the manifest.
  • Wired GitleaksScanner.scan() to pass scan_options through to the parser.
  • Debug logs record only the rule and reason, never the secret value.
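
For readers who want a concrete picture of the flow above, here is a minimal sketch of a parser-level noise filter. It is not the code in filter.py: the function names, patterns, dummy-value list, and entropy threshold are all assumptions made for illustration. Only the Gitleaks report keys `RuleID` and `Secret` and the `enable_noise_filter` flag come from the PR and the Gitleaks JSON format. The sketch assumes `Secret` is always a string.

```python
from __future__ import annotations

import logging
import math
import re
from collections import Counter
from typing import Optional

logger = logging.getLogger("noise_filter_sketch")

# Hypothetical patterns and thresholds, chosen for illustration only.
PLACEHOLDER_RE = re.compile(r"^(\$\{[^}]*\}|\{\{[^}]*\}\}|<[^>]*>)$")
DUMMY_VALUES = {"changeme", "password", "example", "dummy"}
MIN_ENTROPY_BITS = 2.0   # bits per character
SHORT_LENGTH = 16        # entropy check only applies to short strings


def shannon_entropy(text: str) -> float:
    """Bits per character of the empirical character distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def classify_noise(secret: str) -> Optional[str]:
    """Return a reason string when the value looks like noise, else None."""
    if PLACEHOLDER_RE.match(secret):
        return "template_placeholder"
    if secret.lower() in DUMMY_VALUES:
        return "known_dummy_value"
    if len(secret) > 1 and len(set(secret)) == 1:
        return "repeated_character"
    if len(secret) <= SHORT_LENGTH and shannon_entropy(secret) < MIN_ENTROPY_BITS:
        return "low_entropy"
    return None


def filter_items(items: list[dict], enable_noise_filter: bool = True) -> list[dict]:
    """Drop noisy raw items before mapping; pass everything through when disabled."""
    if not enable_noise_filter:
        return list(items)
    kept = []
    for item in items:
        reason = classify_noise(item.get("Secret", ""))
        if reason is None:
            kept.append(item)
        else:
            # Log only the rule and the reason, never the secret itself.
            logger.debug("filtered rule=%s reason=%s", item.get("RuleID"), reason)
    return kept
```

Running the filter on raw report items, before any mapping step, keeps noisy values out of storage and away from the verifier, which is the placement this PR describes.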

Note

When reviewing, please pay particular attention to the following.

  • Whether parser-level filtering fits the raw evidence/store/verifier flow
  • Whether the false-negative prevention patterns use only public-safe synthetic token shapes without being too permissive
  • Whether the enable_noise_filter=False path works across the parser, scanner, and manifest
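
To make the last review point concrete, the following sketch shows one way the flag could travel from the manifest to the parser. The `ScanOptions` class here is a simplified stand-in with a single field, and `scan_options_from_manifest` and `parse_items` are hypothetical helpers, not functions from this repository. It also assumes the manifest is already parsed into a dict with a real boolean value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanOptions:
    # Simplified stand-in: the real ScanOptions presumably has more fields.
    enable_noise_filter: bool = True


def scan_options_from_manifest(manifest: dict) -> ScanOptions:
    """Read scan.enable_noise_filter, defaulting to True when absent."""
    scan_section = manifest.get("scan") or {}
    return ScanOptions(
        enable_noise_filter=scan_section.get("enable_noise_filter", True)
    )


def parse_items(items: list, options: ScanOptions) -> list:
    """Hypothetical parser step: filter only when the option is enabled."""
    if not options.enable_noise_filter:
        return list(items)
    # Placeholder for the real noise check; drops one obvious dummy value.
    return [item for item in items if item.get("Secret") != "changeme"]


manifest = {"scan": {"enable_noise_filter": False}}
options = scan_options_from_manifest(manifest)
raw = [{"Secret": "changeme"}, {"Secret": "q7Zp2LmX9vRt4KsB"}]
unfiltered = parse_items(raw, options)
```

With the flag off, `unfiltered` contains both items. A test per layer (manifest parsing, scanner hand-off, parser behavior) would cover the path end to end.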

Verification:

  • uv run pytest → 366 passed
  • git diff --check → clean
  • cr review --agent -t committed --base origin/main → minor test coverage finding addressed

Dependency

Checklist

  • I confirm that the commits in this PR contain no secret values.

Resolves #3.

Co-Authored-By: Codex GPT-5 <noreply@openai.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a parser-level noise filter for Gitleaks findings to filter out low-signal candidates (such as template placeholders, known dummy values, repeated characters, and low-entropy short strings) before storage or verification. This feature is controlled by a new configuration option, enable_noise_filter (defaulting to True), which is integrated into the scan options and manifest parsing. The review feedback suggests optimizing regex matching performance by merging multiple individual patterns for template placeholders and false-negative prevention into single patterns using alternation. Additionally, the reviewer recommends adding defensive type checks for the Secret field to prevent potential runtime errors if the parsed value is not a string.
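
The two suggestions in this review can be sketched roughly as follows. The sub-patterns and the function name are illustrative assumptions, not the patterns in filter.py. How non-string values should be treated is also an assumption: this sketch keeps them (reports "not a placeholder") rather than raising.

```python
import re
from typing import Any

# Hypothetical placeholder shapes, merged into one compiled alternation
# instead of being matched one pattern at a time.
_PLACEHOLDER_PARTS = [
    r"\$\{[^}]*\}",     # ${VAR}
    r"\{\{[^}]*\}\}",   # {{ var }}
    r"<[^>]*>",         # <your-token>
    r"%[A-Za-z_]+%",    # %VAR%
]
PLACEHOLDER_RE = re.compile("|".join(f"(?:{part})" for part in _PLACEHOLDER_PARTS))


def is_template_placeholder(secret: Any) -> bool:
    """True when the whole value is a template placeholder."""
    # Defensive check: a malformed report may carry None or a number here.
    if not isinstance(secret, str):
        return False
    return PLACEHOLDER_RE.fullmatch(secret) is not None
```

Wrapping each part in a non-capturing group keeps the alternation boundaries unambiguous, and `fullmatch` avoids flagging longer values that merely contain a placeholder.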

Comment thread src/security_scanner/scanners/gitleaks/filter.py Outdated
Comment thread src/security_scanner/scanners/gitleaks/filter.py Outdated
Comment thread src/security_scanner/scanners/gitleaks/filter.py
Comment thread src/security_scanner/scanners/gitleaks/filter.py Outdated
Comment thread src/security_scanner/scanners/gitleaks/filter.py Outdated
Resolve PR review feedback for the Gitleaks noise filter by combining repeated regex checks into single compiled patterns and handling non-string Secret values defensively.

Co-Authored-By: Codex GPT-5 <noreply@openai.com>
@pureliture pureliture merged commit 4a76cb4 into main Jun 12, 2026
2 checks passed
@pureliture pureliture deleted the antigravity/issue-3-noise-filter branch June 16, 2026 11:43

Development

Successfully merging this pull request may close these issues.

Decide the owning layer for the false-positive noise filter (placeholder, low-entropy values)
